# Lightweight Transformer Models

| Model | License | Description | Tags | Publisher | Downloads | Likes |
|---|---|---|---|---|---|---|
| SAUTE | MIT | A lightweight Transformer architecture with speaker awareness, designed for modeling multi-speaker dialogues. | Dialogue Systems, Transformers, English | JustinDuc | 229 | 1 |
| Terjman Nano V2.0 | N/A | Terjman-Nano-v2.0, a 77M-parameter Transformer model for English–Moroccan-dialect translation, optimized for high-quality, precise output. | Machine Translation, Transformers, Multilingual | atlasia | 95 | 2 |
| Spec Vision V1 | MIT | Spec-Vision-V1, a lightweight open-source multimodal model designed for deep integration of visual and textual data, supporting a 128K context length. | Text-to-Image, Transformers, Other | SVECTOR-CORPORATION | 17 | 1 |
| Mochi 1 Transformer 42 | Apache-2.0 | A distilled version of the genmoai mochi-1 transformer with 42 of the original 48 modules, slimmed down by iteratively removing the modules with the smallest MSE. | Text-to-Video, English | NimVideo | 62 | 3 |
| Spam Mail Classifier | Apache-2.0 | A text classifier fine-tuned from microsoft/Multilingual-MiniLM-L12-H384 that labels email subjects as SPAM or NOSPAM. | Text Classification, Transformers | Goodmotion | 943 | 3 |
| Segformer B0 512x1024 City 160k | Other | A lightweight SegFormer-based semantic segmentation model pre-trained on the Cityscapes dataset. | Image Segmentation | smp-hub | 44 | 0 |
| Sapiens Depth 0.3b Torchscript | N/A | Part of the Sapiens family of vision transformers pre-trained on 300 million 1024x1024 human images, used here for depth estimation. | 3D Vision, English | facebook | 69 | 0 |
| Sat 3l Sm | MIT | State-of-the-art sentence segmentation using a 3-layer Transformer, supporting multilingual text. | Sequence Labeling, Transformers, Multilingual | segment-any-text | 168.01k | 6 |
| Sat 3l | MIT | sat-3l, a 3-layer Transformer for use with wtpsplit, achieving state-of-the-art sentence segmentation. | Sequence Labeling, Transformers, Multilingual | segment-any-text | 5,790 | 3 |
| Meshgpt Preview | Apache-2.0 | MeshGPT, a text-to-3D model built on autoencoders and Transformers; described as the first publicly available 3D model tokenizer. | 3D Vision, Transformers | MarcusLoren | 254 | 49 |
| Octo Small 1.5 | MIT | A Transformer-based diffusion-policy model for robot control that predicts robot actions from visual inputs and language instructions. | Multimodal Fusion, Transformers | rail-berkeley | 250 | 6 |
| Paraphrase MiniLM L6 V2 Finetune Summary | N/A | A sentence-transformers embedding model that maps text into a 384-dimensional vector space for semantic search and text similarity. | Text Embedding, Transformers | tonychenxyz | 20 | 1 |
| Sts Distilcamembert Base | MIT | A French sentence embedding model based on DistilCamemBERT that encodes sentences or paragraphs into 768-dimensional vectors for similarity tasks. | Text Embedding, Transformers, French | h4c5 | 48 | 1 |
| Simple Stories 4M | MIT | Part of the Simple Stories series of small text-generation models trained on the TinyStories dataset, focused on generating children's stories. | Text Generation, Transformers, English | broskicodes | 104 | 16 |
| Octo Small | MIT | A diffusion-policy robot control model that predicts 7-dimensional actions for the next 4 steps, suitable for multi-source robot datasets. | Multimodal Fusion, Transformers | rail-berkeley | 335 | 13 |
| Ced Base | Apache-2.0 | CED, a simple ViT-based audio tagging model with state-of-the-art performance on AudioSet. | Audio Classification, Transformers | mispeech | 1,318 | 7 |
| T5 Translate Vietnamese Nom | MIT | A lightweight pre-trained Transformer designed for bidirectional translation between Vietnamese Nôm and Latin script. | Machine Translation, Transformers, Other | minhtoan | 17 | 3 |
| Mobilevitv2 1.0 Voc Deeplabv3 | Other | A MobileViTv2-based semantic segmentation model pre-trained on the PASCAL VOC dataset, handling 512x512 images. | Image Segmentation, Transformers | shehan97 | 1,075 | 0 |
| Segformer B0 Flair One | Apache-2.0 | SegFormer b0, the lightweight variant of the efficient Transformer-based SegFormer segmentation model. | Image Segmentation, Transformers | alanoix | 14 | 1 |
| Internal.wav2vec2 Base Superb Ks Int8 Structured79 | Apache-2.0 | wav2vec2-base-ft-keyword-spotting fine-tuned on the SUPERB dataset for audio classification, optimized with quantization and structured pruning. | Audio Classification, Transformers | yujiepan | 16 | 0 |
| Vit Small Patch16 224.dino | Apache-2.0 | A Vision Transformer (ViT) image feature model trained with the self-supervised DINO method, suited to image classification and feature extraction. | Image Classification, Transformers | timm | 70.62k | 4 |
| T5 Small Vietnamese News | MIT | A lightweight pre-trained encoder-decoder Transformer designed for Vietnamese news summarization. | Text Generation, Transformers, Other | minhtoan | 104 | 4 |
| T5 Small Wikilingua Vietnamese | MIT | A state-of-the-art lightweight Vietnamese encoder-decoder Transformer specialized for text summarization. | Text Generation, Transformers, Other | minhtoan | 43 | 3 |
| Nat Mini In1k 224 | MIT | NAT-Mini, a lightweight vision Transformer built on the neighborhood attention mechanism, designed for ImageNet classification. | Image Classification, Transformers, Other | shi-labs | 109 | 0 |
| T5 Small | Apache-2.0 | T5-small, a pre-trained encoder-decoder model that handles many tasks through a unified text-to-text format, with multilingual support. | Large Language Model, Transformers, Multilingual | optimum | 11.43k | 9 |
| Levit 128S | Apache-2.0 | LeViT-128S, a vision Transformer pre-trained on ImageNet-1k that borrows from convolutional networks for faster inference. | Image Classification, Transformers | facebook | 3,198 | 4 |
| Levit 384 | Apache-2.0 | LeViT-384, a vision Transformer pre-trained on ImageNet-1k that borrows from convolutional networks for faster inference. | Image Classification, Transformers | facebook | 37 | 0 |
| HPD MiniLM F128 | Apache-2.0 | A sentence representation model for semantic retrieval compressed via Homomorphic Projective Distillation (HPD): 23 million parameters, 87MB on disk. | Text Embedding, Transformers | Xuandong | 13 | 0 |
| Fnet Base Finetuned Cola | Apache-2.0 | google/fnet-base fine-tuned on the GLUE CoLA dataset, used to compare the FNet and BERT architectures. | Text Classification, Transformers, English | gchhablani | 15 | 0 |
| Distil Eng Quora Sentence | N/A | A sentence-transformers embedding model that maps sentences into a 768-dimensional vector space for semantic similarity and text clustering. | Text Embedding, Transformers | mboth | 39 | 1 |
| Xtremedistil L6 H384 Uncased | MIT | XtremeDistilTransformers, a task-agnostic knowledge-distilled lightweight Transformer applicable to a range of NLP tasks. | Large Language Model, English | microsoft | 1,854 | 23 |
| Minilm L12 H384 Uncased | MIT | MiniLM, a compact, efficient pre-trained language model compressed through deep self-attention distillation, for language understanding and generation. | Large Language Model | microsoft | 10.19k | 89 |
| Deit Tiny Patch16 224 | Apache-2.0 | DeiT, a data-efficiently trained vision Transformer, pre-trained and fine-tuned on ImageNet-1k for image classification. | Image Classification, Transformers | facebook | 29.04k | 9 |
| Multilingual MiniLM L12 H384 | MIT | A multilingual MiniLM compressed through deep self-attention distillation, supporting multilingual understanding and generation. | Large Language Model, Multilingual | microsoft | 28.51k | 83 |
| T5 Base Chinese | N/A | A truncated MT5-base whose vocabulary and word embeddings are trimmed to Chinese and English characters, for Chinese-English text processing. | Large Language Model | lemon234071 | 102 | 16 |
| Xtremedistil L12 H384 Uncased | MIT | XtremeDistilTransformers distilled through task transfer learning, yielding a small universal model applicable to any task and language. | Large Language Model, Transformers, English | microsoft | 471 | 15 |
| Xtremedistil L6 H256 Uncased | MIT | XtremeDistilTransformers distilled through task transfer learning into a small general-purpose model for a variety of tasks and languages. | Large Language Model, Transformers, English | microsoft | 3,816 | 33 |
| Deit Small Patch16 224 | Apache-2.0 | DeiT, a data-efficiently trained vision Transformer, pre-trained and fine-tuned on ImageNet-1k at 224x224 resolution for image classification. | Image Classification, Transformers | facebook | 24.53k | 8 |
| Distilroberta Base | Apache-2.0 | DistilRoBERTa, a distilled version of RoBERTa-base with fewer parameters and faster inference, for English text processing. | Large Language Model, English | distilbert | 1.2M | 153 |
| Distilbert Base En Ur Cased | Apache-2.0 | A distilled version of distilbert-base-multilingual-cased restricted to English and Urdu while preserving the original model's representations. | Large Language Model, Transformers, Other | Geotrend | 32 | 1 |
© 2025 AIbase